Opportunistic Strategies for Generalized No-Regret Problems

Authors

  • Andrey Bernstein
  • Shie Mannor
  • Nahum Shimkin
Abstract

This paper considers a generalized no-regret problem with vector-valued rewards, defined in terms of a desired reward set of the agent. For each mixed action q of the opponent, the agent has a set R∗(q) where the average reward should reside. In addition, the agent has a response mixed action p which brings the expected reward under these two actions, r(p, q), to R∗(q). If a strategy of the agent ensures that the average reward converges to R∗(q̄n), where q̄n is the empirical distribution of the opponent’s actions, for any strategy of the opponent, we say that it is a no-regret strategy with respect to R∗(q). When the multifunction q ↦ R∗(q) is convex, as is the case in the standard no-regret problem, no-regret strategies can be devised. Our main interest in this paper is in cases where this convexity property does not hold. The best that can be guaranteed in general then is the convergence of the average reward to Rc(q̄n), the convex hull of R∗(q̄n). However, as the game unfolds, it may turn out that the opponent’s choices of actions are limited in some way. If these restrictions were known in advance, the agent could possibly ensure convergence of the average reward to some desired subset of Rc(q̄n), or even approach R∗(q̄n) itself. We formulate appropriate goals for opportunistic no-regret strategies, in the sense that they may exploit such limitations on the opponent’s action sequence in an on-line manner, without knowing them beforehand. As the main technical tool, we propose a class of approachability algorithms that rely on a calibrated forecast of the opponent’s actions, which are opportunistic in the above mentioned sense. As an application, we consider the online no-regret problem with average cost constraints, introduced in Mannor, Tsitsiklis, and Yu (2009). We show, in particular, that our algorithm does attain the best-response-in-hindsight for this problem if the opponent’s play happens to be stationary, or close to stationary in a certain sense.
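
As a rough illustration of the standard (scalar, convex) case mentioned in the abstract, the sketch below computes the empirical distribution q̄n of the opponent's actions and the gap between the best-response-in-hindsight value and the agent's average reward. It assumes a finite reward matrix and takes R∗(q) to be the set of rewards at least as large as max_p r(p, q). The function names and the example matrix are hypothetical and do not come from the paper, and the paper's calibration-based algorithm is not implemented here.

```python
from typing import List, Sequence


def empirical_distribution(actions: Sequence[int], num_actions: int) -> List[float]:
    """Return q_bar_n: the fraction of rounds in which each action was played."""
    counts = [0] * num_actions
    for b in actions:
        counts[b] += 1
    n = len(actions)
    return [c / n for c in counts]


def best_response_value(reward: Sequence[Sequence[float]], q: Sequence[float]) -> float:
    """Return max over agent actions a of the expected reward r(a, q).

    A pure action always attains the maximum over mixed actions p,
    because r(p, q) is linear in p.
    """
    return max(sum(q[b] * row[b] for b in range(len(q))) for row in reward)


def hindsight_regret(
    reward: Sequence[Sequence[float]],
    agent_actions: Sequence[int],
    opponent_actions: Sequence[int],
) -> float:
    """Best-response-in-hindsight value minus the realized average reward.

    Assumption (scalar case only): R*(q) = {r : r >= max_p r(p, q)}, so the
    distance from the average reward to R*(q_bar_n) is max(0, regret).
    """
    n = len(agent_actions)
    average = sum(reward[a][b] for a, b in zip(agent_actions, opponent_actions)) / n
    q_bar = empirical_distribution(opponent_actions, len(reward[0]))
    return best_response_value(reward, q_bar) - average


# Hypothetical 2x2 game: the agent earns 1 when it matches the opponent.
MATCHING_REWARD = [[1.0, 0.0], [0.0, 1.0]]
```

For example, calling `hindsight_regret(MATCHING_REWARD, [0, 0, 0, 0], [0, 1, 1, 1])` compares an agent that always plays action 0 against the best fixed action for the opponent's empirical distribution. A no-regret strategy in this scalar setting is one for which the positive part of this quantity vanishes as the number of rounds grows.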

Similar resources

Opportunistic Approachability and Generalized No-Regret Problems

Blackwell's theory of approachability, introduced in 1956, has since proved a useful tool in the study of a range of repeated multi-agent decision problems. Given a repeated matrix game with vector payoffs, a target set S is approachable by a certain player if he can ensure that the average payoff vector converges to that set, for any strategy of the opponent. In this paper we consider the case ...

On Fixed Convex Combinations of No-Regret Learners

No-regret algorithms are powerful tools for learning in online convex problems that have received increased attention in recent years. Considering affine and external regret, we investigate what happens when a set of no-regret learners (voters) merge their respective strategies in each learning iteration into a single, common one in the form of a convex combination. We show that an agent who executes...

Minimax regret based elicitation of generalized additive utilities

We describe the semantic foundations for elicitation of generalized additively independent (GAI) utilities using the minimax regret criterion, and propose several new query types and strategies for this purpose. Computational feasibility is obtained by exploiting the local GAI structure in the model. Our results provide a practical approach for implementing preference-based constrained configur...

Opportunistic Approachability: Calibration-based Algorithms, with Application to Constrained No-Regret

Blackwell’s approachability theory has played a key role in the theory of learning in games, as well as in the analysis of on-line no-regret algorithms. Given a repeated matrix game with vector payoffs, a target set S is approachable by a designated player if he can ensure that the average payoff vector converges to that set, for any strategy of the opponent. Hence, the notion of approachabilit...

A General Class of No-Regret Learning Algorithms and Game-Theoretic Equilibria

A general class of no-regret learning algorithms, called no-Φ-regret learning algorithms, is defined which spans the spectrum from no-external-regret learning to no-internal-regret learning and beyond. The set Φ describes the set of strategies to which the play of a given learning algorithm is compared. A learning algorithm satisfies no-Φ-regret if no regret is experienced for playing as the al...

Publication date: 2013